Nezzle: an interactive and programmable visualization of biological networks in Python

Abstract

Summary: High-quality visualization of biological networks often requires both manual curation for proper alignment and programming to map external data to the graphical components. Nezzle is network visualization software written in Python that provides programmable and interactive interfaces for both manual and automatic curation of the graphical components of networks, so that users can create high-quality figures.

Availability and implementation: Nezzle is an open-source project under the MIT license and is available from https://github.com/dwgoon/nezzle.

Supplementary information: Supplementary data are available at Bioinformatics online.


Supplementary Figures
Supplementary Figure S1. A common workflow in Nezzle. Visualization in Nezzle is designed primarily to be performed through code execution. The first phase of a common workflow is code execution, in which a network is created or loaded and the overall positions of nodes and edges are determined. After that, the styling of graphic components can be automated by mapping external data onto them. Users can also reposition some graphic components manually after code execution, but this step can be skipped. Finally, the network graphics are saved in an image file format such as PNG or JPG, or in the JSON file format defined by Nezzle.
Supplementary Figure S2. The callback mechanism for executing user-defined code in Nezzle. Any Python module or package can be a plug-in that extends the functionality of Nezzle. Nezzle calls a user-defined callback function, "def update(nav, net)". In other words, the "update" function corresponds to the "main" function in C/C++/Java. "net" is a network data structure that contains the graphic components of nodes, edges, and labels in the network, and "nav" is a tree view widget that manages the network entries in the navigation GUI.

Technology stack
Nezzle is written in Python and depends on NumPy, NetworkX, and Python bindings for Qt (Figure N1). The geometric calculations for network visualization are implemented with the efficient array operations of NumPy, and the data structures of networks are implemented using core Python and NetworkX. The GUI and graphics of Nezzle rely heavily on Python bindings for Qt such as PyQt5 and PySide2, which are abstracted by QtPy. Because Nezzle is written in Python, any Python module or package can be introduced into the technology stack. For instance, machine learning and deep learning frameworks such as scikit-learn (Pedregosa, et al., 2011) and PyTorch (Paszke, et al., 2019) can be used within Nezzle.

Code execution
Nezzle provides two interfaces for code execution that facilitate rapid prototyping for network visualization: (1) the REPL console and (2) the code execution panel (Figure N2). The REPL (read-evaluate-print loop) console is primarily used to explore and understand a network data structure or the functionality of Nezzle. Users can run code snippets in the REPL console for experimental purposes. For example, when users do not remember the exact names of member variables or functions, they can look them up with the built-in dir() function in the REPL console. The code execution panel allows users to dynamically run their own source code, and it is used primarily for styling automation or importing external data. The user-defined source file, called a 'module' in Python, must define an update(nav, net) function, which Nezzle calls when users push the 'Run' button. The parameters of the update function, nav and net, are the navigation widget and the network object, respectively. The network object has the member variables nodes, edges, and labels, and users usually modify the graphic properties of these members. The navigation widget provides the append_item() function, with which users can add a new network object to the GUI of Nezzle.
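As a minimal sketch of the callback contract described above, the module below defines update(nav, net), tags the network name with a time stamp, and registers the network in the navigation widget. The name attribute and the append_item() call follow their usage elsewhere in this document. The FakeNav class and the SimpleNamespace network are hypothetical stand-ins, not part of Nezzle, and exist only so that the sketch can run outside the application.

```python
from datetime import datetime
from types import SimpleNamespace


def update(nav, net):
    """Callback that Nezzle calls when the 'Run' button is pushed."""
    # Tag the network name with a time stamp so that repeated runs
    # show up as distinct entries in the navigation widget.
    time_stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    net.name = "%s (%s)" % (net.name, time_stamp)
    # Register the (modified) network in the GUI.
    nav.append_item(net)


class FakeNav:
    """Hypothetical stand-in for the navigation widget (not Nezzle's class)."""

    def __init__(self):
        self.items = []

    def append_item(self, net):
        self.items.append(net)


# Hypothetical stand-in for a network object, used only outside Nezzle.
demo_nav = FakeNav()
demo_net = SimpleNamespace(name="demo", nodes={}, edges={}, labels={})
update(demo_nav, demo_net)
```

Inside Nezzle only the update function is needed; the stand-in objects at the bottom would be dropped, because the application supplies nav and net itself.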

Example: Signal flow visualization
Biological cells process information about internal or external changes through a signaling network. In this process, information that is critical for cell fate decisions such as proliferation or death is transferred through a series of biochemical reactions, which can be defined as the 'signal flow' in the signaling network. Lee and Cho developed a signal flow estimation algorithm and found that it correctly estimates about 60-80% of the signaling activity changes in response to single or dual perturbations for six signaling networks (Lee and Cho, 2018).
A biochemical reaction such as the phosphorylation of a protein in a signaling network can be represented by a directed link with a sign. Activation and inhibition of a signaling molecule are denoted by plus and minus signs, respectively. Signal flow is mathematically defined as the product of the edge weight and the activity of the source node. The sign of an edge and the sign of its signal flow can be the same or opposite, depending on the source node activity and the edge weight. Interactions between signaling molecules may vary depending on experimental conditions such as mutations or drug perturbations. Therefore, we usually consider the change in signal flow between two different conditions (Figure N7).

Figure N7. Change in signal flow for two different conditions. The weight of an edge represents the signaling intensity between the source and target nodes, and the activity represents the signaling activity of a node. Signal flow is defined as the product of the edge weight and the activity of the source node. Change in signal flow is the difference between the signal flows obtained from two different conditions.
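The definition above can be illustrated with a short NumPy sketch. The three-node network, its weights, and the activity values are hypothetical, and the convention that W[i, j] stores the weight of the edge from node j to node i is an assumption made for this example only.

```python
import numpy as np

# Hypothetical 3-node network. Assumed convention: W[i, j] is the signed
# weight of the edge from source node j to target node i.
W = np.array([[0.0, 0.0, -0.5],   # node 2 inhibits node 0
              [1.0, 0.0, 0.0],    # node 0 activates node 1
              [0.0, 0.8, 0.0]])   # node 1 activates node 2


def signal_flow(W, x):
    """Return F with F[i, j] = W[i, j] * x[j] (edge weight times source activity)."""
    return W * x[np.newaxis, :]


# Hypothetical node activities under two conditions; the same weights are
# used for both conditions to keep the example small.
x_cond1 = np.array([1.0, 0.5, 0.4])
x_cond2 = np.array([1.0, -0.5, 0.1])

# Change in signal flow: difference between the two conditions.
delta_F = signal_flow(W, x_cond2) - signal_flow(W, x_cond1)
```

In the second condition the activating edge from node 1 to node 2 (weight +0.8) carries a negative signal flow because its source activity is negative, which is the case where the sign of the edge and the sign of the signal flow differ.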
In the previous study, Lee and Cho reconstructed a signaling network structure from the original ODE model of 78 state variables (Borisov, et al., 2009). EGF and insulin are the inputs, and ERK and AKT are the outputs in the signaling network. The control goal is to reduce the activities of ERK and AKT by targeting a combination of signaling molecules in the network. In this example, we visualize the change in signal flow and activity for the perturbation of MEK and PIP3 under the condition of RAS mutation (Listing N7 and Figure N8).

Listing N7. An example of visualizing the change in signal flow for two different conditions.

    # visualize_signal_flow_borisov2009.py
    import os
    from os.path import join as pjoin
    from os.path import dirname, abspath
    import sys

    import networkx as nx
    import numpy as np
    import pandas as pd
    from qtpy.QtCore import Qt
    from qtpy.QtGui import QColor
    from qtpy.QtGui import QFont

    import sfa
    from sfv.visualizers import LinearVisualizer
    from nezzle.io import write_image
    from nezzle.utils import reload_modules

Example: Visualization of temporal dynamics
Nezzle makes it easy to visualize the temporal dynamics of biomolecules on the corresponding network structures. This is possible because even complex tasks can be implemented in a few lines of code by combining Python packages. We provide an example of visualizing the dynamics of a two-node negative feedback loop (Listing N8 and Figures N9-N10). In this example, we adopted the SciPy (Virtanen, et al., 2020) and MoviePy (Zulko, 2015) packages for ODE integration and movie file generation, respectively.

    scale_width=200, scale_height=200)
    # end of for
    create_movie(fpaths, osp.join(dpath, "2nnfl-dynamics.gif"))

Figure N9. The dynamics of the two-node negative feedback loop in the time domain.

Figure N10. Snapshots from the dynamics of the two-node negative feedback loop.
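The ODE integration step can be sketched with scipy.integrate.solve_ivp. The linear model below, in which X activates Y and Y inhibits X, is a hypothetical stand-in chosen because it has a closed-form solution. It is not the model used in Listing N8, and the parameter names d and w are assumptions for this illustration. Movie generation is not covered here.

```python
import numpy as np
from scipy.integrate import solve_ivp

d = 0.2  # hypothetical degradation rate of both nodes
w = 1.0  # hypothetical interaction strength


def rhs(t, state):
    """Two-node negative feedback loop: X activates Y, Y inhibits X."""
    x, y = state
    dxdt = -d * x - w * y   # Y inhibits X
    dydt = w * x - d * y    # X activates Y
    return [dxdt, dydt]


t = np.linspace(0.0, 10.0, 101)
sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0], t_eval=t,
                rtol=1e-8, atol=1e-10)

# sol.y has shape (2, len(t)); each column is one time frame that could be
# mapped to node colors or sizes when rendering a snapshot.
x_traj, y_traj = sol.y
```

For this particular model the trajectories are damped oscillations, x(t) = exp(-d t) cos(w t) and y(t) = exp(-d t) sin(w t), which gives a simple way to check the integration before the values are mapped onto the network graphics.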

Example: Development of a layout algorithm with PyTorch
One of the advantages of using Python is access to major deep learning frameworks such as TensorFlow (Martín, et al., 2015) and PyTorch (Paszke, et al., 2019). Deep learning frameworks provide automatic differentiation (autograd) engines together with various optimizers, so optimization problems that can be solved using the derivatives of an objective function can be handled with autograd. In this example, we demonstrate a network layout algorithm based on the maximization of the mean pairwise distances between nodes using PyTorch (Listing N9 and Figures N11-N12). The objective of the layout algorithm is to find the node positions P that maximize the mean pairwise distances (MPD) between the n nodes.
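One plausible written-out form of this objective is given below. It assumes Euclidean distances and a mean taken over the n(n-1)/2 unordered node pairs; the exact normalization is an assumption and does not change the maximizer. The symbol p_i, denoting the position of node i (row i of P), is notation introduced here.

```latex
P^{*} = \arg\max_{P} \, \mathrm{MPD}(P),
\qquad
\mathrm{MPD}(P) = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \lVert p_i - p_j \rVert_2
```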
We can implement this optimization problem by defining the objective function in the forward function of a torch.nn.Module. To convert the maximization problem into a minimization problem, we simply multiply the objective function by -1. All member variables of type torch.nn.Parameter in a torch.nn.Module are automatically registered as the target parameters of autograd, and an optimizer such as torch.optim.SGD updates the parameters to minimize the objective function (or loss function). In this example, the positions of nodes are treated as the parameters of a neural network. Another advantage of deep learning frameworks is that algorithms can run on their GPU-accelerated operations. In this example, we can apply the GPU-accelerated operations of PyTorch by designating the device as "cuda".

Listing N9. An example of developing a layout algorithm with PyTorch.

    # layout_net_by_pytorch.py
    import os
    import os.path as osp
    from datetime import datetime

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torch.nn.functional as F
    import moviepy.editor as mpy
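The approach described above can be sketched as follows. This is a minimal illustration and not the code of Listing N9: the class name MPDLayout, the random initial positions, the learning rate, and the number of steps are all assumptions, and the mean pairwise distance is computed with torch.nn.functional.pdist over unordered node pairs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class MPDLayout(nn.Module):
    """Hypothetical module whose only trainable parameters are node positions."""

    def __init__(self, positions):
        super().__init__()
        # nn.Parameter members are registered automatically for autograd.
        self.P = nn.Parameter(positions.clone())

    def forward(self):
        # F.pdist returns the Euclidean distance of every unordered node pair.
        mpd = F.pdist(self.P).mean()
        # Multiply by -1 so that minimizing the loss maximizes the MPD.
        return -mpd


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MPDLayout(torch.rand(10, 2)).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.1)

mpd_start = -model().item()
for _ in range(100):
    optimizer.zero_grad()
    loss = model()
    loss.backward()
    optimizer.step()
mpd_end = -model().item()

# Optimized positions as a NumPy array of shape (10, 2), ready to be
# assigned to node positions.
positions = model.P.detach().cpu().numpy()
```

Because this objective is unbounded, the nodes keep moving apart for as long as the optimizer runs, so the number of steps effectively controls the spread of the final layout.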

Example: Iris dataset layout dynamics created with scikit-learn and PyTorch
This example shows how to create layout dynamics for the Iris dataset with scikit-learn (Pedregosa, et al., 2011) and PyTorch (Paszke, et al., 2019). The Iris dataset consists of samples of three Iris species, and each sample has four features: sepal length, sepal width, petal length, and petal width in centimeters (Anderson, 1936; Fisher, 1936). We can obtain the Iris dataset from scikit-learn, which provides it as an example dataset (sklearn.datasets.load_iris). The samples of the Iris dataset can be visualized on the first two principal components (Listing N10, sklearn.decomposition.PCA). In other words, we used the first two principal components as the positions of the Iris samples in the two-dimensional space of Nezzle. Figure N13 shows the Iris samples projected on the principal components.
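The projection step can be sketched with the two scikit-learn functions named above. The color mapping follows the legend of Figure N13; the variable names are assumptions, and assigning the resulting positions and colors to Nezzle nodes is left out because that part of Listing N10 is not shown here.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data           # 150 samples x 4 features (cm)
species = iris.target   # 0: setosa, 1: versicolor, 2: virginica

# Project the four features onto the first two principal components.
pca = PCA(n_components=2)
positions = pca.fit_transform(X)   # shape (150, 2)

# Color per sample, following the legend of Figure N13.
colors = {0: "red", 1: "green", 2: "blue"}
node_colors = [colors[int(label)] for label in species]
```

Each row of positions can then serve as the two-dimensional coordinates of one node, with node_colors giving the species color of that node.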
To understand the characteristics of the layout algorithm introduced in "8. Example: Development of a layout algorithm with PyTorch", we applied the algorithm to the Iris samples on the principal components.
The objective function is again based on the MPD (mean pairwise distances) of the n nodes with positions P, and the MPD is the same as that of "8. Example: Development of a layout algorithm with PyTorch". However, the optimization process is divided into two stages: (1) minimization of the MPD and (2) maximization of the MPD. In the first stage, the Euclidean distances between the sample points were decreased until only a single point was visible (Figure N14). The objective of the second stage is the opposite of the first. In the second stage, the optimizer increased the distances between the sample points as much as possible, resulting in scattered points like exploding fireworks (Figure N14). These results imply that a layout algorithm based on the minimization or maximization of the MPD does not conserve the original properties of a dataset, such as the geometric patterns of clusters on the principal components.

    create_movie(fpaths_img, osp.join(dpath, "iris-layout-dynamics.gif"))
    time_stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    net.name = "%s (%s)"%(net.name, time_stamp)
    nav.append_item(net)

Figure N13. Principal component view of the Iris dataset. Red, green, and blue represent setosa, versicolor, and virginica, respectively.

Figure N14. Snapshots created by the layout dynamics of the Iris dataset.
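The two-stage process can be sketched as below. This is an illustration under stated assumptions rather than the code of Listing N10: random points stand in for the PCA positions, and the helper names, learning rate, and step counts are hypothetical. With these small settings the first stage only contracts the point cloud partially; a full collapse to a single visible point would need more steps or a larger learning rate.

```python
import torch
import torch.nn.functional as F
import torch.optim as optim


def mean_pairwise_distance(P):
    """Mean Euclidean distance over all unordered pairs of rows in P."""
    return F.pdist(P).mean()


def run_stage(P, sign, n_steps, lr=0.2):
    """Run SGD on sign * MPD; sign=+1 minimizes the MPD, sign=-1 maximizes it."""
    optimizer = optim.SGD([P], lr=lr)
    history = []
    for _ in range(n_steps):
        optimizer.zero_grad()
        loss = sign * mean_pairwise_distance(P)
        loss.backward()
        optimizer.step()
        # Record the MPD after each update, e.g. to render one frame per step.
        history.append(mean_pairwise_distance(P).item())
    return history


torch.manual_seed(0)
# Hypothetical stand-in for the PCA positions of the samples.
P = torch.nn.Parameter(torch.randn(30, 2))
mpd_initial = mean_pairwise_distance(P).item()

stage1 = run_stage(P, sign=+1.0, n_steps=100)  # stage 1: points contract
stage2 = run_stage(P, sign=-1.0, n_steps=100)  # stage 2: points scatter
```

Saving the positions after every step, instead of only the MPD value, would give the sequence of frames from which layout dynamics such as those in Figure N14 can be assembled.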